Quantum binary field inversion: improved circuit 
depth via choice of basis representation 



Brittanney Amento
Florida Atlantic University
Department of Mathematical Sciences
Boca Raton, FL 33431
bferoz@fau.edu

Martin Rötteler
NEC Laboratories America
4 Independence Way, Suite 200
Princeton, NJ 08540, U.S.A.
mroetteler@nec-labs.com

Rainer Steinwandt
Florida Atlantic University
Department of Mathematical Sciences
Boca Raton, FL 33431
rsteinwa@fau.edu

September 26, 2012 



Abstract 

Finite fields of the form $\mathbb{F}_{2^m}$ play an important role in coding theory and cryptography. We show that the choice of how to represent the elements of these fields can have a significant impact on the resource requirements for quantum arithmetic. In particular, we show how Gaussian normal basis representations and 'ghost-bit basis' representations can be used to implement inverters with a quantum circuit of depth $O(m \log(m))$. To the best of our knowledge, this is the first construction with subquadratic depth reported in the literature. Our quantum circuit for the computation of multiplicative inverses is based on the Itoh-Tsujii algorithm, which exploits the fact that in a normal basis representation squaring corresponds to a permutation of the coefficients. We give resource estimates for the resulting quantum circuit for inversion over binary fields $\mathbb{F}_{2^m}$ based on an elementary gate set that is useful for fault-tolerant implementation.



1 Introduction 

In quantum computing, arithmetic operations occur in many contexts [2, 5, 7, 11, 16, 21, 29]. Having good quantum circuits for arithmetic is indispensable for obtaining good resource estimates and efficient circuit implementations of more complex quantum algorithms. In view of the cryptographic significance, it is not surprising that a number of publications have already explored quantum circuits to implement finite field arithmetic, including [3, 15, 17, 18].



Important special cases are arithmetic operations in finite prime fields and finite binary fields (cf., for instance, [22]). While there is some common ground between the prime-field case and the characteristic-two case, there are also important differences. In this paper we focus entirely on quantum circuits to implement arithmetic in fields of the form $\mathbb{F}_{2^m}$.

Interestingly, thus far the literature on quantum circuits for $\mathbb{F}_{2^m}$-arithmetic focuses completely on polynomial basis representations, and computing multiplicative inverses by implementing the extended Euclidean algorithm as discussed in [15] appears to be the common choice. The cost of implementing inversion this way is significant, as the resulting circuit has a size that is cubic in m. When realizing the group law on a binary elliptic curve as a quantum circuit, the cost of this operation becomes apparent: in an earlier issue of this journal, Maslov et al. presented a solution to the discrete logarithm problem on binary elliptic curves [17]. An important technique for achieving quadratic depth with their solution was to bring down the number of finite field inversions to one. For the asymptotic analysis, the quadratic depth of this single inversion is still as expensive as all other arithmetic operations combined. So when trying to improve on the discrete logarithm circuit presented in [17] (which from a cryptanalytic point of view is desirable), reducing the complexity of binary finite field inversion is a natural first step.

Our contribution. This paper presents linear-depth multipliers using a so-called ghost-bit basis and using Gaussian normal bases. Building on these multipliers, we describe an inverter for $\mathbb{F}_{2^m}$ of depth $O(m \log(m))$ derived from a classical inversion algorithm by Itoh and Tsujii [12], using $O(m \log(m))$ qubits. We hope that our work stimulates follow-up work on using different representations of finite fields in quantum circuits, and we expect that the circuits presented in this paper will be useful for speeding up the arithmetic in quantum algorithms for computing discrete logarithms on elliptic curves, but also for other algebraic problems that can be tackled on a quantum computer, including hidden polynomial equations [5], hidden shift problems [7, 24, 28], and certain period finding tasks [11, 16, 29].

For the fault-tolerant implementation of quantum circuits on several error-correcting codes [8, 25], the elementary gate set consisting of all Clifford gates and the so-called T-gate is a preferable one. The T-gate is the local unitary $\mathrm{diag}(1, \exp(2\pi i/8))$. The cost of a fault-tolerant implementation of T-gates is extremely high, hence it is preferable to reduce their number as much as possible. We show that in a Gaussian normal basis or a ghost-bit basis representation, an inversion over $\mathbb{F}_{2^m}$ can be computed in a T-depth of $O(m \log(m))$ and using at most $O(m^2 \log(m))$ many T-gates.



2 Preliminaries: finite fields $\mathbb{F}_{2^m}$

Perhaps the most popular representation of finite fields $\mathbb{F}_{2^m}$ is the use of a polynomial basis. In the following, we briefly review some basic facts about this representation as well as two alternatives: the use of a ghost-bit basis and of a Gaussian normal basis. All of these representations are known, and we claim no originality for this section.

2.1 Polynomial basis representation 

Denoting by $f = x^m + \sum_{i=0}^{m-1} f_i x^i \in \mathbb{F}_2[x]$ an irreducible polynomial of degree m over the prime field $\mathbb{F}_2$, we can identify $\mathbb{F}_{2^m}$ with the quotient ring $\mathbb{F}_2[x]/(f)$, and this identification forms the basis of a popular representation of binary finite fields.

Definition 2.1 (Polynomial basis representation) 

With the above notation, let $x^0 + (f), x^1 + (f), \ldots, x^{m-1} + (f)$ be the canonical $\mathbb{F}_2$-vector space basis of $\mathbb{F}_2[x]/(f)$. In the polynomial basis representation, each $\alpha \in \mathbb{F}_{2^m}$ is represented by the unique tuple $(\alpha_0, \ldots, \alpha_{m-1}) \in \mathbb{F}_2^m$ such that $\alpha = \sum_{i=0}^{m-1} \alpha_i \cdot (x^i + (f))$.

Example 2.1 The polynomial $x^4 + x^3 + x^2 + x + 1 \in \mathbb{F}_2[x]$ is irreducible, and so the field with 16 elements can be identified with $\mathbb{F}_2[x]/(x^4 + x^3 + x^2 + x + 1)$. Choosing $f = x^4 + x^3 + x^2 + x + 1$ in the above definition, in the polynomial basis representation, the tuple $(1, 0, 1, 0) \in \mathbb{F}_2^4$ represents the field element $x^2 + 1 + (f)$.
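A minimal classical sketch of this representation, under our own conventions: a field element is stored as a Python integer whose bit i is the coefficient $\alpha_i$, and the hypothetical helper `poly_mul` multiplies by shift-and-XOR followed by reduction modulo f. It reproduces Example 2.1 and squares $x^2 + 1$.

```python
def from_tuple(coeffs) -> int:
    """Encode (alpha_0, ..., alpha_{m-1}) as an int whose bit i is alpha_i."""
    return sum(c << i for i, c in enumerate(coeffs))


def poly_mul(a: int, b: int, f: int, m: int) -> int:
    """Multiply a and b in F_2[x]/(f); f is given with its leading x^m term (bit m set)."""
    result = 0
    # Carry-less schoolbook product: XOR in a shifted copy of a for every set bit of b.
    for i in range(m):
        if (b >> i) & 1:
            result ^= a << i
    # Reduce the product (degree at most 2m - 2) modulo f, from the top degree down.
    for deg in range(2 * m - 2, m - 1, -1):
        if (result >> deg) & 1:
            result ^= f << (deg - m)
    return result


F16 = 0b11111                     # f = x^4 + x^3 + x^2 + x + 1 from Example 2.1
alpha = from_tuple((1, 0, 1, 0))  # x^2 + 1
print(bin(alpha))                 # 0b101
print(bin(poly_mul(alpha, alpha, F16, 4)))  # 0b1110, i.e. x^3 + x^2 + x
```

Bit-level encodings like this are only a convenient way to cross-check examples classically; they say nothing about the depth of the corresponding quantum circuits.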



In the current literature on quantum arithmetic for binary finite fields, the representation from Definition 2.1 seems to be the only one considered. Beauregard et al. [3], Maslov et al. [18], and Kaye and Zalka [15] provide circuits for addition, multiplication and inversion using a polynomial basis.

• Using one qubit per coefficient of $\alpha = \sum_{i=0}^{m-1} \alpha_i \cdot (x^i + (f))$, adding $|\alpha\rangle$ to an m-qubit input $|\beta\rangle$ can be done in the obvious way with m CNOT gates, each conditioned on one of the $\alpha_i$. These CNOT gates operate on disjoint wires, and hence this adder can be realized in depth 1.



• Building on a classical Mastrovito multiplier [19, 20, 26], the multiplication of two m-qubit inputs $|\alpha\rangle$ and $|\beta\rangle$ can be realized in depth $9m + O(1)$ using Toffoli gates. If the irreducible polynomial f is the all-one polynomial or a trinomial, $m^2 - m - 1$ gates suffice [18].



• Computing the inverse of a non-zero $\alpha \in \mathbb{F}_{2^m}$ using the extended Euclidean algorithm can be implemented in depth $O(m^2)$ with $2m + O(\log(m))$ qubits [15, 17].

In this paper, we will look at two different representations of binary fields which, from an algorithmic point of view, suggest an interesting alternative to the use of a polynomial basis.



2.2 Ghost-bit basis representation 

Keeping the notation from above, suppose the irreducible polynomial f we use is the all-one polynomial $x^m + \cdots + 1$. In this case, m + 1 is prime and 2 is a generator of the cyclic group $\mathbb{F}_{m+1}^*$ (cf. [12]). Then f divides $x^{m+1} + 1 = (x + 1) \cdot (x^m + \cdots + 1) \in \mathbb{F}_2[x]$, and we can define the map

$$
\phi\colon\ \mathbb{F}_2[x]/(f) \to \mathbb{F}_2[x]/(x^{m+1} + 1), \qquad \sum_{i=0}^{m-1} \alpha_i x^i + (f) \mapsto \sum_{i=0}^{m-1} \alpha_i x^i + (x^{m+1} + 1).
$$

The map $\phi$ may be seen as appending an extra (zero) bit to the coefficient vector of a polynomial basis representation of $\alpha \in \mathbb{F}_2[x]/(f)$. As detailed by Silverman [30] (who suggests attributing the construction to Itoh and Tsujii [12]), instead of adding, multiplying, and inverting elements in $\mathbb{F}_2[x]/(f)$ directly, we can apply $\phi$ to the operands, perform the needed additions, multiplications, and inversions in $\mathbb{F}_2[x]/(x^{m+1} + 1)$, and then map the result back into $\mathbb{F}_2[x]/(f)$ by applying

$$
\mathbb{F}_2[x]/(x^{m+1} + 1) \to \mathbb{F}_2[x]/(f), \qquad \sum_{i=0}^{m} \alpha_i x^i + (x^{m+1} + 1) \mapsto \sum_{i=0}^{m-1} (\alpha_i + \alpha_m)\, x^i + (f). \qquad (1)
$$

Definition 2.2 (Ghost-bit basis representation) 

With the above notation, assume that $1 + \cdots + x^m$ is irreducible. In the ghost-bit basis representation, each $\alpha \in \mathbb{F}_{2^m}$ is represented by a tuple $(\alpha_0, \ldots, \alpha_m) \in \mathbb{F}_2^{m+1}$ such that $(\alpha_0 + \alpha_m, \ldots, \alpha_{m-1} + \alpha_m)$ is the polynomial basis representation of $\alpha$ using the irreducible polynomial $1 + \cdots + x^m$.

Hence, a conversion from the ghost-bit basis representation to a polynomial basis representation boils down to dropping the ghost bit and adding (XOR) it to the remaining m bits. In a quantum circuit, this translates into a single CNOT with multiple fan-out at the very end, provided we do not have to restore the initial $|0\rangle$-value of the ghost (qu)bit. We note that for adding field elements alone, applying the map $\phi$ has no advantage, but also no dramatic drawback.
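The two conversions can be sketched in a few lines of Python. This is our own classical illustration (bit tuples, hypothetical function names `to_ghost` and `from_ghost`), not code from the paper; `from_ghost` implements the back map of Equation (1).

```python
def to_ghost(poly):
    """phi: append a zero ghost bit to a polynomial basis tuple (alpha_0, ..., alpha_{m-1})."""
    return tuple(poly) + (0,)


def from_ghost(ghost):
    """Equation (1): XOR the ghost bit alpha_m into the other m bits, then drop it."""
    *low, g = ghost
    return tuple(c ^ g for c in low)


print(to_ghost((1, 0, 1, 0)))       # (1, 0, 1, 0, 0)
print(from_ghost((1, 0, 0, 0, 1)))  # (0, 1, 1, 1)
# The all-one vector is the coefficient vector of f itself, so it represents zero.
print(from_ghost((1, 1, 1, 1, 1)))  # (0, 0, 0, 0)
```

As the last line illustrates, the representation is redundant: every field element has exactly two ghost-bit tuples, differing by the all-one vector, and `from_ghost` maps both to the same polynomial basis tuple.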

• Using one qubit per coefficient of $\alpha = \sum_{i=0}^{m} \alpha_i x^i + (x^{m+1} + 1)$, adding $|\alpha\rangle$ to an $(m+1)$-qubit input $|\beta\rangle$ can be done in the obvious way with m + 1 CNOT gates, conditioned on the individual $\alpha_i$. These CNOT gates operate on disjoint wires, and hence this adder can be realized in depth 1.

To realize quantum circuits for multiplying and inverting field elements, we are interested in exploiting the following properties of $\mathbb{F}_2[x]/(x^{m+1} + 1)$:

• Squaring corresponds to a shuffle of the coefficient vector:

$$
\Bigl(\sum_{i=0}^{m} \alpha_i x^i + (x^{m+1} + 1)\Bigr)^2 = \sum_{i=0}^{m} \alpha_{\pi^{-1}(i)}\, x^i + (x^{m+1} + 1), \qquad (2)
$$

where $\pi(i) = 2 \cdot i \bmod (m + 1)$ for $i = 0, \ldots, m$.



Example 2.2 As noted in Example 2.1, the polynomial $x^4 + x^3 + x^2 + x + 1 \in \mathbb{F}_2[x]$ is irreducible, and so $\mathbb{F}_{2^4}$ affords a ghost-bit basis representation: the above map $\phi$ translates operations in $\mathbb{F}_{2^4}$ into operations in $\mathbb{F}_2[x]/(x^5 + 1)$. Applying $\phi$ to $x^2 + 1 + (x^4 + x^3 + x^2 + x + 1)$, we obtain $x^2 + 1 + (x^5 + 1)$, i.e., the polynomial basis representation (1, 0, 1, 0) from Example 2.1 translates into the ghost-bit basis representation (1, 0, 1, 0, 0).

For m = 4, the permutation $\pi$ in Equation (2) is $(0)(1, 2, 4, 3)$, so the ghost-bit basis representation of $(x^2 + 1 + (x^5 + 1))^2$ is (1, 0, 0, 0, 1), corresponding to $x^4 + 1 + (x^5 + 1)$. Applying the map from Equation (1), we obtain the corresponding polynomial basis representation (0, 1, 1, 1), i.e., $x^3 + x^2 + x + (x^4 + x^3 + x^2 + x + 1)$.
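Squaring via the permutation $\pi$ of Equation (2) is easy to check classically. The Python sketch below (function names are our own, hypothetical choices) moves coefficient $\alpha_i$ to position $2i \bmod (m+1)$, prints the cycle structure of $\pi$ for m = 4, and reproduces the numbers of Example 2.2.

```python
def ghost_square(coeffs):
    """Equation (2): alpha_i moves to position pi(i) = 2*i mod (m + 1)."""
    n = len(coeffs)  # n = m + 1
    out = [0] * n
    for i, c in enumerate(coeffs):
        out[(2 * i) % n] = c
    return tuple(out)


def pi_cycles(n):
    """Cycle decomposition of pi(i) = 2*i mod n on {0, ..., n - 1}."""
    seen, cycles = set(), []
    for start in range(n):
        if start in seen:
            continue
        cycle, i = [], start
        while i not in seen:
            seen.add(i)
            cycle.append(i)
            i = (2 * i) % n
        cycles.append(tuple(cycle))
    return cycles


def from_ghost(ghost):
    """Back map of Equation (1)."""
    *low, g = ghost
    return tuple(c ^ g for c in low)


sq = ghost_square((1, 0, 1, 0, 0))  # (x^2 + 1)^2 in F_2[x]/(x^5 + 1)
print(pi_cycles(5))    # [(0,), (1, 2, 4, 3)]
print(sq)              # (1, 0, 0, 0, 1)
print(from_ghost(sq))  # (0, 1, 1, 1)
```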
• To multiply two elements $\alpha = \sum_{i=0}^{m} \alpha_i x^i + (x^{m+1} + 1)$ and $\beta = \sum_{i=0}^{m} \beta_i x^i + (x^{m+1} + 1)$, the following formula for the coefficients of their product $\gamma = \sum_{i=0}^{m} \gamma_i x^i + (x^{m+1} + 1)$ can be used:

$$
\gamma_i = \sum_{j=0}^{m} \alpha_j \, \beta_{(i-j) \bmod (m+1)} \qquad (3)
$$

As explained in Section 3.1 below, in combination with an observation in [18], Equation (3) yields a linear-depth circuit for multiplication in $\mathbb{F}_2[x]/(x^{m+1} + 1)$.
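Equation (3) is a cyclic convolution over $\mathbb{F}_2$. The following Python sketch is our own classical illustration (`ghost_mul` and `poly_mul_allone` are hypothetical names): it evaluates Equation (3) and cross-checks it, for m = 4, against schoolbook multiplication modulo the all-one polynomial. The check relies on the back map of Equation (1) being compatible with multiplication, which holds because f divides $x^{m+1} + 1$.

```python
def ghost_mul(a, b):
    """Equation (3): gamma_i = sum_j alpha_j * beta_{(i - j) mod (m + 1)} over F_2."""
    n = len(a)  # n = m + 1
    return tuple(sum(a[j] & b[(i - j) % n] for j in range(n)) % 2 for i in range(n))


def from_ghost(ghost):
    """Back map of Equation (1)."""
    *low, g = ghost
    return tuple(c ^ g for c in low)


def poly_mul_allone(a, b):
    """Reference product in F_2[x]/(1 + x + ... + x^m) on polynomial basis tuples."""
    m = len(a)
    prod = [0] * (2 * m - 1)
    for i in range(m):
        for j in range(m):
            prod[i + j] ^= a[i] & b[j]
    # Reduce from the top: x^d = x^(d-m) * (1 + x + ... + x^(m-1)) modulo f.
    for d in range(2 * m - 2, m - 1, -1):
        if prod[d]:
            prod[d] = 0
            for k in range(d - m, d):
                prod[k] ^= 1
    return tuple(prod[:m])


def bits(n, width):
    return tuple((n >> i) & 1 for i in range(width))


# Exhaustive check for m = 4 over all 5-bit ghost-bit tuples (including non-zero ghost bits).
ok = all(
    from_ghost(ghost_mul(bits(x, 5), bits(y, 5)))
    == poly_mul_allone(from_ghost(bits(x, 5)), from_ghost(bits(y, 5)))
    for x in range(32)
    for y in range(32)
)
print(ok)  # True
```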

Remark 2.1 The idea of a ghost-bit basis can be generalized to a representation with more redundancy: whenever the polynomial $x^n + 1 \in \mathbb{F}_2[x]$ has an irreducible factor f of degree m, we can define a map $\phi$ analogously as above, using $n - m$ 'ghost bits.' Geiselmann and Lukhaub [9] discuss the implementation of $\mathbb{F}_{2^m}$-multiplication in such a representation with a classical reversible circuit.



2.3 Normal basis representation 

The possibility of an inexpensive squaring operation will be of great benefit for the inversion algorithm below, and a 
natural type of field representation to be considered in this context is a normal basis representation. 

Definition 2.3 (Normal basis representation) 

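Let $\eta \in \mathbb{F}_{2^m}$ be such that $\{\eta, \eta^2, \eta^{2^2}, \ldots, \eta^{2^{m-1}}\}$ is an $\mathbb{F}_2$-vector space basis of $\mathbb{F}_{2^m}$. In a normal basis representation of $\mathbb{F}_{2^m}$, we represent each $\alpha \in \mathbb{F}_{2^m}$ by the unique tuple $(\alpha_0, \alpha_1, \ldots, \alpha_{m-1}) \in \mathbb{F}_2^m$ with $\alpha = \sum_{i=0}^{m-1} \alpha_i \cdot \eta^{2^i}$.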

A normal basis representation exists for every field $\mathbb{F}_{2^m}$ of degree $m \ge 1$, and more background information on normal bases can be found in [14], for instance. By construction, squaring in such a representation is just a cyclic shift, and addition can be implemented as bit-wise addition, just as in the case of a polynomial or ghost-bit basis representation. To ensure the availability of an efficient multiplication procedure, one often restricts to a particular type of normal basis, which exists whenever $8 \nmid m$. In this paper we focus entirely on these so-called Gaussian normal bases; see also [6, 13] for further background and proofs of the properties that are relevant for our purposes.

Definition 2.4 (Gaussian normal basis) 

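Assume that $t \ge 1$ is such that $p = tm + 1$ is prime and the index of the subgroup generated by $2 \in \mathbb{F}_p^*$ is coprime to m. Let $\alpha \in \mathbb{F}_{2^{mt}}$ be a primitive p-th root of unity, and let $u \in \mathbb{F}_p^*$ have order t. Then

$$
\Bigl\{ \sum_{j=0}^{t-1} \alpha^{2^i u^j} \;:\; 0 \le i < m \Bigr\}
$$

is a normal basis of $\mathbb{F}_{2^m}$, commonly referred to as a type t Gaussian normal basis.¹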

The complexity of multiplication with respect to a Gaussian normal basis representation is reflected by its type t. The Digital Signature Standard [22, Appendix D.1.3] offers several practical examples of (extension degree, type)-pairs of binary fields $\mathbb{F}_{2^m}$: (163, 4), (233, 2), (283, 6), (409, 4), and (571, 10). For cryptographic applications, one is interested in situations where the type t is small. Hence, in our analysis we regard t as a (small) constant.
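The table F used in Equation (4) below can be generated mechanically. The following Python sketch is our own illustration (the function name `gnb_F` is hypothetical, and its brute-force searches are only meant for moderate p): it checks the conditions of Definition 2.4 and fills in $F(2^i u^j \bmod p) = i$. Since the subgroup of order t in $\mathbb{F}_p^*$ is unique, the table does not depend on which element u of order t is picked.

```python
from math import gcd, isqrt


def mult_order(v: int, p: int) -> int:
    """Multiplicative order of v modulo the prime p (v not divisible by p)."""
    e, x = 1, v % p
    while x != 1:
        x = (x * v) % p
        e += 1
    return e


def gnb_F(m: int, t: int):
    """Return [None, F(1), ..., F(p-1)] for a type-t Gaussian normal basis of F_{2^m}."""
    p = t * m + 1
    if p < 3 or any(p % d == 0 for d in range(2, isqrt(p) + 1)):
        raise ValueError("p = tm + 1 must be an odd prime")
    if gcd((p - 1) // mult_order(2, p), m) != 1:
        raise ValueError("index of <2> in F_p^* is not coprime to m")
    u = next(v for v in range(1, p) if mult_order(v, p) == t)  # an element of order t
    F = [None] * p
    for i in range(m):
        for j in range(t):
            F[pow(2, i, p) * pow(u, j, p) % p] = i
    return F


print(gnb_F(5, 2)[1:])  # [0, 1, 3, 2, 4, 4, 2, 3, 1, 0]
```

For the DSS pairs (163, 4) and (233, 2) every entry of the table is filled, as expected.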

• Using one qubit per coefficient of $\alpha$, adding $|\alpha\rangle$ to an m-qubit input $|\beta\rangle$ can be done in the obvious way with m CNOT gates, conditioned on the individual $\alpha_i$. These CNOT gates operate on disjoint wires, and hence this adder can be realized in depth 1.

• Squaring corresponds to a cyclic (right-)shift of the coefficient vector:

$$
\mathbb{F}_{2^m} \to \mathbb{F}_{2^m}, \qquad \sum_{i=0}^{m-1} \alpha_i \, \eta^{2^i} \mapsto \sum_{i=0}^{m-1} \alpha_{(i-1) \bmod m} \, \eta^{2^i}.
$$

¹ The basis elements are known as Gauss periods of type (m, t), but we do not need this terminology here.
• With the notation from Definition 2.4, define $F(1), F(2), \ldots, F(p-1)$ through $F(2^i u^j \bmod p) = i$ for $0 \le i < m$ and $0 \le j < t$. Then the representation $(\gamma_0, \ldots, \gamma_{m-1})$ of the product $\gamma = \alpha \cdot \beta$ can be computed as

$$
\gamma_i = \begin{cases}
\displaystyle\sum_{k=1}^{tm-1} \alpha_{F(k+1)+i} \, \beta_{F(p-k)+i}, & \text{if } 2 \mid t,\\[3ex]
\displaystyle\sum_{k=1}^{tm-1} \alpha_{F(k+1)+i} \, \beta_{F(p-k)+i} + \sum_{k=1}^{m/2} \bigl( \alpha_{k-1+i} \, \beta_{k-1+m/2+i} + \alpha_{k-1+m/2+i} \, \beta_{k-1+i} \bigr), & \text{if } 2 \nmid t,
\end{cases} \qquad (4)
$$

for $i = 0, \ldots, m - 1$ (with all indices being understood modulo m).
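Equation (4) translates directly into code. The sketch below is our own classical Python illustration (hypothetical names `gnb_F`, `gnb_mul`, `bits`). Besides evaluating the formula, it runs consistency checks that any correct field multiplication in this basis must pass: the all-one vector acts as the identity (it represents $\sum_i \eta^{2^i}$, which equals 1 for a Gaussian normal basis), $\alpha \cdot \alpha$ equals the cyclic right shift of $\alpha$, and the product is associative. The types $(m, t) = (5, 2)$ and $(4, 3)$ exercise both cases of Equation (4).

```python
def gnb_F(m, t):
    """F(2^i u^j mod p) = i, with u of order t in F_p^* (small p only; no validity checks)."""
    p = t * m + 1
    u = next(v for v in range(1, p)
             if pow(v, t, p) == 1 and all(pow(v, d, p) != 1 for d in range(1, t)))
    F = [None] * p
    for i in range(m):
        for j in range(t):
            F[pow(2, i, p) * pow(u, j, p) % p] = i
    return F


def gnb_mul(a, b, F, t):
    """Equation (4): coordinates of alpha * beta in a type-t Gaussian normal basis."""
    m = len(a)
    p = t * m + 1
    gamma = []
    for i in range(m):
        acc = 0
        for k in range(1, t * m):
            acc ^= a[(F[k + 1] + i) % m] & b[(F[p - k] + i) % m]
        if t % 2 == 1:  # odd type (then m is even): the extra m/2 pairs of terms
            h = m // 2
            for k in range(1, h + 1):
                acc ^= a[(k - 1 + i) % m] & b[(k - 1 + h + i) % m]
                acc ^= a[(k - 1 + h + i) % m] & b[(k - 1 + i) % m]
        gamma.append(acc)
    return tuple(gamma)


def bits(n, m):
    return tuple((n >> i) & 1 for i in range(m))


for m, t in [(5, 2), (4, 3)]:
    F = gnb_F(m, t)
    elems = [bits(n, m) for n in range(2 ** m)]
    one = (1,) * m
    assert all(gnb_mul(x, one, F, t) == x for x in elems)
    # Squaring is a cyclic right shift: (alpha^2)_i = alpha_{i-1}.
    assert all(gnb_mul(x, x, F, t) == x[-1:] + x[:-1] for x in elems)
    sample = elems[::3]
    assert all(gnb_mul(gnb_mul(x, y, F, t), z, F, t) == gnb_mul(x, gnb_mul(y, z, F, t), F, t)
               for x in sample for y in sample for z in sample)
print("all checks passed")
```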

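Example 2.3 (Gaussian normal basis) For $\mathbb{F}_{2^5}$ there exists a Gaussian normal basis of type t = 2: we have $p = 2 \cdot 5 + 1 = 11$, and 2 is a generator of $\mathbb{F}_{11}^*$, so the index of the subgroup generated by $2 \in \mathbb{F}_{11}^*$ is certainly coprime to m = 5. Choosing $u = 10 \in \mathbb{F}_{11}^*$, an element of order t = 2, we compute

$$
\begin{array}{c|cccccccccc}
k & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline
F(k) & 0 & 1 & 3 & 2 & 4 & 4 & 2 & 3 & 1 & 0
\end{array}
$$

Now, from Equation (4) for the general multiplication $\gamma = \alpha \cdot \beta$, we obtain

$$
\gamma_i = \alpha_{1+i}\beta_i + \alpha_{3+i}\beta_{1+i} + \alpha_{2+i}\beta_{3+i} + \alpha_{4+i}\beta_{2+i} + \alpha_{4+i}\beta_{4+i} + \alpha_{2+i}\beta_{4+i} + \alpha_{3+i}\beta_{2+i} + \alpha_{1+i}\beta_{3+i} + \alpha_i\beta_{1+i} \qquad (5)
$$

for $i = 0, \ldots, m - 1$.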

2.4 Computing multiplicative inverses with the Itoh-Tsujii algorithm 

With a field representation where squaring is inexpensive, an exponentiation-based alternative to Euclid's algorithm for computing multiplicative inverses becomes worthwhile. For any $\alpha \in \mathbb{F}_{2^m}^*$, we have $\alpha^{2^m - 1} = 1$, and hence $\alpha^{-1} = \alpha^{2^m - 2}$ can be found by raising $\alpha$ to the power $2^m - 2$. The almost maximal Hamming weight of this exponent makes a naive square-and-multiply implementation problematic. Happily, a technique by Itoh and Tsujii [12] enables an efficient implementation of this exponentiation (see, e.g., [10, 12, 27, 31]). We begin by writing



$$
m - 1 = \sum_{i=1}^{\mathrm{HW}(m-1)} 2^{k_i}, \qquad \text{where } \lfloor \log_2(m-1) \rfloor = k_1 > k_2 > \cdots > k_{\mathrm{HW}(m-1)} \ge 0,
$$

and $\mathrm{HW}(\cdot)$ denotes the Hamming weight. Now, for fixed $\alpha \in \mathbb{F}_{2^m}^*$ and for $i \ge 0$, we define $\beta_i := \alpha^{2^i - 1}$. In particular, $\beta_0 = 1$, $\beta_1 = \alpha$, and the inverse of $\alpha$ can be obtained as $\alpha^{-1} = (\beta_{m-1})^2$. So once we know $\beta_{m-1}$, only one final squaring is needed, which for a ghost-bit or a normal basis representation is just a permutation. To compute $\beta_{m-1}$, we exploit the fact that for all non-negative integers $i, j$ the relation

$$
\beta_{i+j} = \beta_i^{2^j} \cdot \beta_j \qquad (6)
$$

holds. By repeatedly applying Equation (6) with $i = j$, we see that computing all of $\beta_{2^0}, \beta_{2^1}, \ldots, \beta_{2^{k_1}}$ requires no more than $\lfloor \log_2(m-1) \rfloor$ multiplications in $\mathbb{F}_{2^m}$ and $\lfloor \log_2(m-1) \rfloor$ exponentiations by a power of 2. In a ghost-bit or a Gaussian normal basis representation, all occurring exponentiations are ($\alpha$-independent) permutations, and as the multiplications are of the form $\beta_j \cdot (\beta_j)^{2^j}$, to save resources we will exploit that $(\beta_j)^{2^j}$ can be derived from $\beta_j$: there is no need to implement a general multiplier.

Beginning with $\beta_{2^{k_1}}$, use Equation (6) to calculate $\beta_{2^{k_1} + 2^{k_2}}$ and then iterate this process to obtain $\beta_{2^{k_1} + 2^{k_2} + 2^{k_3}}$, etc., until we finally reach $\beta_{m-1} = \beta_{2^{k_1} + 2^{k_2} + \cdots + 2^{k_{\mathrm{HW}(m-1)}}}$. Hence, with $\beta_{2^{k_1}}, \ldots, \beta_{2^{k_{\mathrm{HW}(m-1)}}}$ being available, $\mathrm{HW}(m-1) - 1$ multiplications in $\mathbb{F}_{2^m}$ and $\mathrm{HW}(m-1) - 1$ exponentiations by a power of 2 suffice to derive $\beta_{m-1}$.
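The bookkeeping of this chain can be made explicit with a small Python sketch of our own (the function names are hypothetical). Each step $(i, j)$ stands for one application of Equation (6), $\beta_{i+j} = \beta_i^{2^j} \cdot \beta_j$, i.e., one multiplication plus one exponentiation by $2^j$. Replaying the steps on exponents confirms that the chain ends at $\beta_{m-1} = \alpha^{2^{m-1} - 1}$ after $\lfloor \log_2(m-1) \rfloor + \mathrm{HW}(m-1) - 1$ multiplications.

```python
def itoh_tsujii_steps(m: int):
    """Steps (i, j), each meaning beta_{i+j} = beta_i^(2^j) * beta_j, leading to beta_{m-1} (m >= 2)."""
    n = m - 1
    ks = [k for k in range(n.bit_length() - 1, -1, -1) if (n >> k) & 1]  # k_1 > k_2 > ...
    steps = []
    for s in range(ks[0]):        # doubling: beta_{2^(s+1)} from beta_{2^s}
        steps.append((2 ** s, 2 ** s))
    acc = 2 ** ks[0]
    for k in ks[1:]:              # fold in 2^{k_2}, 2^{k_3}, ...
        steps.append((acc, 2 ** k))
        acc += 2 ** k
    return steps


def replay_exponent(m: int) -> int:
    """Track e_i with beta_i = alpha^(e_i) through the steps and return e_{m-1}."""
    e = {1: 1}  # beta_1 = alpha
    for i, j in itoh_tsujii_steps(m):
        e[i + j] = e[i] * 2 ** j + e[j]
    return e[m - 1]


print(itoh_tsujii_steps(7))  # [(1, 1), (2, 2), (4, 2)]
print(replay_exponent(7))    # 63 = 2^6 - 1
```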

Example 2.4 (Itoh-Tsujii inversion) For m = 7, we have $m - 1 = 6 = 2^2 + 2^1$, so given an input $\alpha = \beta_{2^0} \in \mathbb{F}_{2^7}$, with $2 = \lfloor \log_2(6) \rfloor$ applications of Equation (6) we can find $\beta_{2^1}$ and $\beta_{2^2}$. Then, with $1 = \mathrm{HW}(6) - 1$ additional application of Equation (6), we obtain $\beta_{2^2 + 2^1}$. A final squaring, which in the case of a ghost-bit or a Gaussian normal basis representation is just a permutation of coefficients, yields $\alpha^{-1} = (\beta_{2^2 + 2^1})^2$.
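Putting the pieces together, the following Python sketch (our own classical illustration, with hypothetical names) runs the Itoh-Tsujii chain in a ghost-bit basis: multiplications use Equation (3), and every exponentiation by $2^j$ is the coefficient permutation $i \mapsto 2^j i \bmod (m+1)$. It is checked on m = 4 (Example 2.2) and on m = 10, where 11 is prime and 2 generates $\mathbb{F}_{11}^*$, which makes the all-one polynomial of degree 10 irreducible.

```python
def ghost_mul(a, b):
    """Equation (3): cyclic convolution over F_2."""
    n = len(a)
    return tuple(sum(a[j] & b[(i - j) % n] for j in range(n)) % 2 for i in range(n))


def ghost_pow2(a, j):
    """alpha -> alpha^(2^j): coefficient i moves to position 2^j * i mod (m + 1)."""
    n = len(a)
    out = [0] * n
    factor = pow(2, j, n)
    for i, c in enumerate(a):
        out[(factor * i) % n] = c
    return tuple(out)


def from_ghost(ghost):
    """Back map of Equation (1)."""
    *low, g = ghost
    return tuple(c ^ g for c in low)


def itoh_tsujii_inverse(alpha):
    """Ghost-bit representation of alpha^(-1) = (beta_{m-1})^2 with beta_i = alpha^(2^i - 1)."""
    m = len(alpha) - 1
    n = m - 1
    ks = [k for k in range(n.bit_length() - 1, -1, -1) if (n >> k) & 1]
    beta = {1: alpha}
    for s in range(ks[0]):  # beta_{2e} = beta_e^(2^e) * beta_e
        e = 2 ** s
        beta[2 * e] = ghost_mul(ghost_pow2(beta[e], e), beta[e])
    acc = 2 ** ks[0]
    for k in ks[1:]:        # beta_{acc+e} = beta_acc^(2^e) * beta_e
        e = 2 ** k
        beta[acc + e] = ghost_mul(ghost_pow2(beta[acc], e), beta[e])
        acc += e
    return ghost_pow2(beta[m - 1], 1)  # final squaring: only a permutation


def ghost_elements(m):
    """All non-zero field elements as ghost-bit tuples with ghost bit 0."""
    return [tuple((n >> i) & 1 for i in range(m)) + (0,) for n in range(1, 2 ** m)]


for m in (4, 10):
    one = (1,) + (0,) * (m - 1)
    assert all(from_ghost(ghost_mul(a, itoh_tsujii_inverse(a))) == one for a in ghost_elements(m))
print("inverses verified for m = 4 and m = 10")
```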
3 Multiplying in linear depth using ghost-bit and Gaussian normal basis representations

For implementing the inverter discussed in the sequel, the multiplication of field elements plays a crucial role. As we 
are interested in Gaussian normal basis and ghost-bit basis representations, we begin by detailing linear-depth circuits 
for multiplication in each of these representations. 

3.1 Linear depth multiplication using a ghost-bit basis 

To multiply two $(m+1)$-bit inputs $|\alpha\rangle$ and $|\beta\rangle$ which represent field elements $\alpha, \beta \in \mathbb{F}_{2^m}$ in a ghost-bit basis, Formula (3) immediately yields a circuit consisting of $(m+1)^2$ Toffoli gates: each individual product $\alpha_j \beta_{(i-j) \bmod (m+1)}$ corresponds to a single Toffoli gate. Adopting an observation from [18], we recognize that these $(m+1)^2$ Toffoli gates can be evaluated in linear depth: for fixed $(i - 2j) \bmod (m+1)$, the Toffoli gates to compute the m + 1 products $\alpha_j \beta_{(i-j) \bmod (m+1)}$ ($j = 0, \ldots, m$) operate on disjoint wires. Consequently, we can evaluate these m + 1 Toffoli gates in parallel, and iterating over all m + 1 possible values for $(i - 2j) \bmod (m+1)$, we obtain a multiplier of depth m + 1. This establishes the following result, which for the special case $|\zeta\rangle = |0\rangle$ yields a basic multiplier.

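Proposition 3.1 If a ghost-bit basis representation of $\mathbb{F}_{2^m}$ is available, the multiplication $|\alpha\rangle|\beta\rangle|\zeta\rangle \mapsto |\alpha\rangle|\beta\rangle|\zeta + \alpha\beta\rangle$ with $\alpha, \beta, \zeta \in \mathbb{F}_{2^m}$ can be realized in depth m + 1 with $m^2 + 2m + 1$ Toffoli gates.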

As a concrete example of a ghost-bit basis multiplier, let us apply the above proposition to the field with 16 
elements. 



Example 3.1 Consider the ghost-bit basis representation of $\mathbb{F}_{2^4}$ from Example 2.2. In this case, evaluating all terms $\alpha_j \beta_{(i-j) \bmod 5}$ in order for $(i - 2j) \bmod 5 = 0, 1, 2, 3, 4$ yields a multiplier of depth 5, consisting of $5 \cdot 5 = 25$ Toffoli gates, as shown in Figure 1.



Figure 1: A ghost-bit basis multiplier for $\alpha \cdot \beta \in \mathbb{F}_{2^4}$
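The layering behind Proposition 3.1 can be checked mechanically. In the Python sketch below (our own illustration; the wire labels "a", "b", "g" and the function names are hypothetical), a Toffoli gate is recorded as the three wires it touches, and gates are grouped by $s = (i - 2j) \bmod (m+1)$. For m = 4 this reproduces the depth 5 and the 25 gates of Example 3.1.

```python
def ghost_multiplier_layers(m: int):
    """One layer per s = (i - 2j) mod (m+1); a gate is (a_j, b_{(i-j) mod (m+1)}; target g_i)."""
    n = m + 1
    layers = []
    for s in range(n):
        layer = []
        for j in range(n):
            i = (s + 2 * j) % n
            layer.append((("a", j), ("b", (i - j) % n), ("g", i)))
        layers.append(layer)
    return layers


def wires_disjoint(layer) -> bool:
    """True if no wire is touched by two gates of the same layer."""
    wires = [w for gate in layer for w in gate]
    return len(wires) == len(set(wires))


layers = ghost_multiplier_layers(4)  # Example 3.1
print(len(layers), sum(len(layer) for layer in layers))  # 5 25
print(all(wires_disjoint(layer) for layer in layers))    # True
```

The disjointness of the targets $g_{s+2j}$ within a layer uses that 2 is invertible modulo the odd prime m + 1.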
Next, we consider the special case of computing products $\alpha \cdot \alpha^{2^r}$ with a fixed r, as occurring in the Itoh-Tsujii algorithm described in Section 2.4. This variant of our multiplier takes as input the ghost-bit basis representation $(\alpha_0, \ldots, \alpha_m) \in \mathbb{F}_2^{m+1}$ of some $\alpha \in \mathbb{F}_{2^m}$ and a $|0\rangle$-initialized $(m+1)$-bit register, in which the ghost-bit basis representation $(\gamma_0, \gamma_1, \ldots, \gamma_m)$ of $\gamma = \alpha \cdot \alpha^{2^r}$ will be stored. The total number of wires required is only $2 \cdot (m+1)$. As we are using a ghost-bit basis representation, squaring is a simple permutation, and more generally exponentiation by $2^r$ corresponds to a permutation. In particular, we can obtain the ghost-bit basis representation of $\alpha^{2^r}$ from $(\alpha_0, \alpha_1, \ldots, \alpha_m)$ by reading out the individual entries in a different order. Hence, the following result confirms that the saving of m + 1 wires can be achieved without sacrificing the property of having linear depth.

Proposition 3.2 If a ghost-bit basis for $\mathbb{F}_{2^m}$ is available, then for any fixed $r \in \{0, \ldots, m\}$ the multiplication $|\alpha\rangle|\zeta\rangle \mapsto |\alpha\rangle|\zeta + \alpha \cdot \alpha^{2^r}\rangle$ with $\alpha, \zeta \in \mathbb{F}_{2^m}$ can be realized in depth 2m + 2 using $m^2 + m$ Toffoli and m + 1 CNOT gates.

Proof: Let $\alpha = \sum_{i=0}^{m} \alpha_i x^i + (x^{m+1} + 1)$ be a ghost-bit basis representation for $\alpha \in \mathbb{F}_{2^m}$. Then Equation (2) yields $\alpha^{2^r} = \sum_{i=0}^{m} \alpha_{\pi^{-r}(i)} x^i + (x^{m+1} + 1)$, and with Equation (3) we recognize the i-th coefficient of $\alpha \cdot \alpha^{2^r}$ as

$$
\gamma_i = \sum_{j=0}^{m} \alpha_j \, \alpha_{\pi^{-r}((i-j) \bmod (m+1))} \qquad (i = 0, \ldots, m).
$$

As applying $\pi$ can be seen as doubling modulo m + 1, applying $\pi^{-r}$ translates into division by $2^r$ modulo m + 1. We may assume that $2^r \not\equiv 1 \pmod{m+1}$, as otherwise $r \in \{0, m\}$, and exponentiation with $2^r$ becomes the identity on $\mathbb{F}_{2^m}$. Then, for any fixed 'index sum' $\sigma \in \{0, \ldots, m\}$, there are exactly m + 1 pairs $(i, j) \in \{0, \ldots, m\}^2$ satisfying

$$
\pi^{-r}((i - j) \bmod (m+1)) + j \equiv \sigma \pmod{m+1}. \qquad (7)
$$

Namely, for each $i \in \{0, \ldots, m\}$ we obtain a unique corresponding $j \in \{0, \ldots, m\}$ by solving the linear equation

$$
2^{-r} \cdot (i - j) + j \equiv \sigma \pmod{m+1}
$$

for j; to do so we divide by $1 - 2^{-r} \pmod{m+1}$, which is possible as $2^r \not\equiv 1$. The subsequent argument shows that we can compute the m + 1 products $\alpha_j \alpha_{\pi^{-r}((i-j) \bmod (m+1))}$ for those (i, j)-pairs satisfying Equation (7) in depth 2. By arranging our circuit such that the values $\sigma = 0, \ldots, m$ are processed in order, we achieve the claimed overall depth of 2m + 2.

Suppose we have two products $\alpha_j \alpha_{\pi^{-r}((i-j) \bmod (m+1))}$ and $\alpha_{j'} \alpha_{\pi^{-r}((i'-j') \bmod (m+1))}$ satisfying

$$
\pi^{-r}((i - j) \bmod (m+1)) + j \equiv \sigma \equiv \pi^{-r}((i' - j') \bmod (m+1)) + j';
$$

then we may assume $j \ne j'$, as otherwise

$$
\pi^{-r}((i - j) \bmod (m+1)) = \pi^{-r}((i' - j') \bmod (m+1)),
$$

and there is nothing to show. Consequently, the two gates evaluating the two terms

$$
\alpha_j \, \alpha_{\pi^{-r}((i-j) \bmod (m+1))} \quad\text{and}\quad \alpha_{j'} \, \alpha_{\pi^{-r}((i'-j') \bmod (m+1))}
$$

have different target bits. We can evaluate these two terms in parallel whenever the intersection

$$
\{j, \pi^{-r}((i - j) \bmod (m+1))\} \cap \{j', \pi^{-r}((i' - j') \bmod (m+1))\}
$$

is empty; in this case the corresponding gates operate on disjoint wires. To better understand the situation, let us define an undirected graph $\mathcal{G}$ with vertex set $\mathbb{Z}/(m+1)$, so that vertex $i + (m+1)$ corresponds to the wire representing $\alpha_i$. We connect two vertices whenever they serve as control bits for the same gate, i.e., we include the edges

$$
\{j \bmod (m+1),\ \pi^{-r}((i - j) \bmod (m+1)) \bmod (m+1)\}
$$

for all $i, j \in \mathbb{Z}/(m+1)$ with $\pi^{-r}((i - j) \bmod (m+1)) + j \equiv \sigma \pmod{m+1}$. In particular, we obtain exactly one self-loop ($j \equiv \sigma/2 \bmod (m+1)$). Instead of using the above description of the edges, we can equivalently include all edges

$$
\{j \bmod (m+1),\ \sigma - j \bmod (m+1)\}
$$

for $j \in \mathbb{Z}/(m+1)$. Because $\sigma - (\sigma - j) \equiv j \pmod{m+1}$, we see that the resulting graph $\mathcal{G}$ consists of m/2 vertex pairs, each connected by two parallel edges, and one isolated point (namely $\sigma/2 \bmod (m+1)$) with a self-loop, corresponding to a CNOT. Consequently, two colors suffice to color the edges in such a way that no neighboring edges share a color. Now all gates corresponding to edges with the same color operate on disjoint wires and hence can be evaluated in parallel. □

To illustrate the 'wire saving' offered by Proposition 3.2, let us again consider the field with 16 elements.

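Example 3.2 For r = 2, the permutation $\pi^{-r}$ corresponds to a multiplication with $2^{-2} \equiv -1 \pmod 5$, i.e., we have to find

$$
\gamma_i = \alpha_0 \alpha_{(-i) \bmod 5} + \alpha_1 \alpha_{(1-i) \bmod 5} + \alpha_2 \alpha_{(2-i) \bmod 5} + \alpha_3 \alpha_{(3-i) \bmod 5} + \alpha_4 \alpha_{(4-i) \bmod 5} \qquad (i = 0, \ldots, 4).
$$

Using the condition $2 \cdot j - i \equiv \sigma \pmod 5$, each of the 25 occurring terms can be associated with a particular value of $\sigma$:

$\sigma = 0$: $\alpha_0\alpha_0,\ \alpha_1\alpha_4,\ \alpha_2\alpha_3,\ \alpha_3\alpha_2,\ \alpha_4\alpha_1$
$\sigma = 1$: $\alpha_0\alpha_1,\ \alpha_1\alpha_0,\ \alpha_2\alpha_4,\ \alpha_3\alpha_3,\ \alpha_4\alpha_2$
$\sigma = 2$: $\alpha_0\alpha_2,\ \alpha_1\alpha_1,\ \alpha_2\alpha_0,\ \alpha_3\alpha_4,\ \alpha_4\alpha_3$
$\sigma = 3$: $\alpha_0\alpha_3,\ \alpha_1\alpha_2,\ \alpha_2\alpha_1,\ \alpha_3\alpha_0,\ \alpha_4\alpha_4$
$\sigma = 4$: $\alpha_0\alpha_4,\ \alpha_1\alpha_3,\ \alpha_2\alpha_2,\ \alpha_3\alpha_1,\ \alpha_4\alpha_0$

The resulting graph for $\sigma = 0$ is shown in Figure 2.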

Figure 2: Graph representing the terms for $\sigma = 0$ as described in Example 3.2. The 2-coloring of the edges (where the different line styles indicate the colors) translates into a depth 2 circuit for these terms.




Each edge corresponds to one gate, and with the 2-coloring of the edges we obtain a depth 2 circuit for evaluating the terms associated with $\sigma = 0$ (and adding them to the respective input/partial result $\gamma_i$). Applying a similar reasoning to the other $\sigma$-values, we obtain a circuit of depth 10 for implementing the map $|\alpha\rangle|\zeta\rangle \mapsto |\alpha\rangle|\zeta + \alpha \cdot \alpha^4\rangle$ for $\alpha, \zeta \in \mathbb{F}_{2^4}$, as seen in Figure 3.
Figure 3: A ghost-bit basis multiplier for $\alpha \cdot \alpha^{2^2} \in \mathbb{F}_{2^4}$
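The two-coloring from the proof of Proposition 3.2 can be turned into a concrete schedule. The Python sketch below is our own classical illustration with hypothetical names: for each index sum $\sigma$, the gate belonging to the smaller endpoint of each parallel edge pair $\{j, \sigma - j\}$ goes into the first layer (together with the self-loop, i.e., the CNOT), and the other gate into the second layer. It then checks that each layer acts on disjoint wires and that simulating the gates reproduces $\alpha \cdot \alpha^{2^r}$. Python 3.8 or later is assumed for `pow(x, -1, n)`.

```python
def square_mult_layers(m: int, r: int):
    """2(m+1) layers of gates (j, partner, i), each meaning gamma_i ^= alpha_j * alpha_partner."""
    n = m + 1
    c = pow(pow(2, r, n), -1, n)  # division by 2^r mod (m + 1), i.e. pi^(-r)
    if c == 1:
        raise ValueError("need 2^r != 1 mod (m + 1)")
    c_inv = pow(c, -1, n)
    layers = []
    for sigma in range(n):
        first, second = [], []
        for j in range(n):
            partner = (sigma - j) % n          # second control: c * (i - j) = sigma - j
            i = (j + (sigma - j) * c_inv) % n  # target solving Equation (7)
            (first if j <= partner else second).append((j, partner, i))
        layers += [first, second]
    return layers


def disjoint(layer) -> bool:
    wires = []
    for j, partner, i in layer:
        wires += [("a", j), ("g", i)] + ([("a", partner)] if partner != j else [])
    return len(wires) == len(set(wires))


def simulate(layers, alpha):
    gamma = [0] * len(alpha)
    for layer in layers:
        for j, partner, i in layer:
            gamma[i] ^= alpha[j] & alpha[partner]
    return tuple(gamma)


def reference(alpha, r):
    """gamma_i = sum_j alpha_j * alpha_{2^(-r) (i - j) mod (m+1)}, as in the proof."""
    n = len(alpha)
    c = pow(pow(2, r, n), -1, n)
    return tuple(sum(alpha[j] & alpha[c * (i - j) % n] for j in range(n)) % 2 for i in range(n))


layers = square_mult_layers(4, 2)  # Example 3.2: alpha * alpha^4 in F_{2^4}
print(len(layers), all(disjoint(layer) for layer in layers))  # 10 True
```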
3.2 Linear depth multiplication using a Gaussian normal basis

Assume $\mathbb{F}_{2^m}$ has a Gaussian normal basis of type t. Our multiplier takes as input the normal basis representations $(\alpha_0, \alpha_1, \ldots, \alpha_{m-1}) \in \mathbb{F}_2^m$ and $(\beta_0, \ldots, \beta_{m-1}) \in \mathbb{F}_2^m$ of two elements $\alpha, \beta \in \mathbb{F}_{2^m}$, along with a $|0\rangle$-initialized m-bit register, in which the normal basis representation $(\gamma_0, \gamma_1, \ldots, \gamma_{m-1})$ of $\gamma = \alpha \cdot \beta$ will be stored. Consequently, the total number of wires is 3m. Each coefficient product $\alpha_j \beta_k$ in Equation (4) can be realized with a Toffoli gate, and so for a fixed $i \in \{0, \ldots, m-1\}$ we can compute $\gamma_i$ with at most

$$
\begin{cases}
tm - 1 & \text{consecutive Toffoli gates, if } t \text{ is even},\\
tm - 1 + 2 \cdot (m/2) = (t+1)m - 1 & \text{consecutive Toffoli gates, if } t \text{ is odd}.
\end{cases}
$$

From this we immediately obtain an overall gate count of $(t + (t \bmod 2)) \cdot m^2 - m$ Toffoli gates for our normal basis multiplier. This multiplier can be realized in linear depth: fix an arbitrary $k \in \{1, \ldots, tm - 1\}$ and two different positions $i, i' \in \{0, \ldots, m-1\}$ in the normal basis representation of the product $\gamma = \alpha \cdot \beta$. Then the Toffoli gates computing $\alpha_{F(k+1)+i} \beta_{F(p-k)+i}$ and $\alpha_{F(k+1)+i'} \beta_{F(p-k)+i'}$ operate on disjoint wire sets, as obviously

$$
F(k+1) + i \not\equiv F(k+1) + i' \pmod{m} \quad\text{and}\quad F(p-k) + i \not\equiv F(p-k) + i' \pmod{m}.
$$

For odd t, we see analogously that $\alpha_{k-1+i} \beta_{k-1+m/2+i}$ can be calculated in parallel with $\alpha_{k-1+i'} \beta_{k-1+m/2+i'}$ for all $i \ne i'$, and $\alpha_{k-1+m/2+i} \beta_{k-1+i}$ can be calculated in parallel with $\alpha_{k-1+m/2+i'} \beta_{k-1+i'}$ for all $i \ne i'$, as summarized in the following result.

Proposition 3.3 If a Gaussian normal basis of type t is available for $\mathbb{F}_{2^m}$, the multiplication $|\alpha\rangle|\beta\rangle|\zeta\rangle \mapsto |\alpha\rangle|\beta\rangle|\zeta + \alpha\beta\rangle$ of two field elements $\alpha, \beta \in \mathbb{F}_{2^m}$ can be realized in depth $(t + (t \bmod 2)) \cdot m - 1$ using $(t + (t \bmod 2)) \cdot m^2 - m$ Toffoli gates.

As a concrete example of a Gaussian normal basis multiplier, let us apply the above proposition to the field with 32 
elements. 



Example 3.3 Consider the type 2 Gaussian normal basis from Example 2.3. Here the product $\gamma = \alpha \cdot \beta$ of $\alpha, \beta \in \mathbb{F}_{2^5}$ is represented by $(\gamma_0, \ldots, \gamma_{m-1})$ with

$$
\gamma_i = \alpha_{1+i}\beta_i + \alpha_{3+i}\beta_{1+i} + \alpha_{2+i}\beta_{3+i} + \alpha_{4+i}\beta_{2+i} + \alpha_{4+i}\beta_{4+i} + \alpha_{2+i}\beta_{4+i} + \alpha_{3+i}\beta_{2+i} + \alpha_{1+i}\beta_{3+i} + \alpha_i\beta_{1+i}.
$$

Implementing this summation term by term yields a normal basis multiplier for $\mathbb{F}_{2^5}$ comprising $9 \cdot 5 = 45$ Toffoli gates and of total depth 9 (each term of the summation can be evaluated in parallel for $i = 0, \ldots, 4$), as seen in Figure 4.
Figure 4: A Gaussian normal basis multiplier for $\alpha \cdot \beta \in \mathbb{F}_{2^5}$
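For Proposition 3.3, the layers are simply indexed by k (plus m extra layers for odd t). The Python sketch below is our own illustration with hypothetical names: it rebuilds the table F as in Definition 2.4, lists the Toffoli gates of Equation (4) layer by layer as wire triples, and checks that each layer acts on disjoint wires. For $(m, t) = (5, 2)$ it reproduces the depth 9 and the 45 gates of Example 3.3.

```python
def gnb_F(m, t):
    """F(2^i u^j mod p) = i, with u of order t in F_p^* (small p only; no validity checks)."""
    p = t * m + 1
    u = next(v for v in range(1, p)
             if pow(v, t, p) == 1 and all(pow(v, d, p) != 1 for d in range(1, t)))
    F = [None] * p
    for i in range(m):
        for j in range(t):
            F[pow(2, i, p) * pow(u, j, p) % p] = i
    return F


def gnb_layers(m, t):
    """Toffoli layers (a-wire, b-wire, target g_i) for Equation (4); indices taken mod m."""
    F, p = gnb_F(m, t), t * m + 1
    layers = [[(("a", (F[k + 1] + i) % m), ("b", (F[p - k] + i) % m), ("g", i)) for i in range(m)]
              for k in range(1, t * m)]
    if t % 2 == 1:
        h = m // 2
        for k in range(1, h + 1):
            layers.append([(("a", (k - 1 + i) % m), ("b", (k - 1 + h + i) % m), ("g", i)) for i in range(m)])
            layers.append([(("a", (k - 1 + h + i) % m), ("b", (k - 1 + i) % m), ("g", i)) for i in range(m)])
    return layers


def wires_disjoint(layer):
    wires = [w for gate in layer for w in gate]
    return len(wires) == len(set(wires))


for m, t in [(5, 2), (4, 3)]:
    layers = gnb_layers(m, t)
    depth, gates = len(layers), sum(len(layer) for layer in layers)
    print(m, t, depth, gates, all(wires_disjoint(layer) for layer in layers))
# 5 2 9 45 True
# 4 3 15 60 True
```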
Similarly, as in the case of a ghost-bit basis representation, it is possible to compute products of the form $\alpha \cdot \alpha^{2^r}$ in linear depth without having $\alpha^{2^r}$ represented as a separate input. Hence, the following result shows that the saving of m wires can be achieved without sacrificing the property of having linear depth.

Proposition 3.4 If a Gaussian normal basis of type t is available for $\mathbb{F}_{2^m}$, then for any fixed $r \in \{0, \ldots, m\}$ the multiplication $|\alpha\rangle|\zeta\rangle \mapsto |\alpha\rangle|\zeta + \alpha \cdot \alpha^{2^r}\rangle$ for $\alpha, \zeta \in \mathbb{F}_{2^m}$ can be realized in depth $3 \cdot (t + (t \bmod 2)) \cdot m - 3$ using $(t + (t \bmod 2)) \cdot m^2 - m$ gates (CNOT or Toffoli).

Proof: Using Equation (4) to calculate the product $\alpha \cdot \alpha^{2^r}$ again, the upper bound for the total number of gates remains unchanged. It could happen, however, that the control bits of a Toffoli gate end up on the same wire, so that instead of a Toffoli we obtain a CNOT gate.

To argue that the circuit depth grows at most by a factor of 3, we fix an arbitrary $k \in \{1, \ldots, tm - 1\}$. Writing $\beta = \alpha^{2^r}$, we have $\beta_j = \alpha_{j-r}$ for all j, and we claim that all m terms

$$
\alpha_{F(k+1)+i} \, \beta_{F(p-k)+i} = \alpha_{F(k+1)+i} \, \alpha_{F(p-k)-r+i} \qquad (i = 0, \ldots, m-1) \qquad (8)
$$

can be calculated in parallel using depth at most 3.



Case $F(k+1) \equiv F(p-k) - r \pmod{m}$: Here, instead of Toffoli gates, we have only CNOT gates operating on disjoint wires. Hence, all m terms can be computed at the same time, i.e., in depth 1.

Case $F(k+1) \not\equiv F(p-k) - r \pmod{m}$: For $i \ne i'$, we can evaluate the terms

$$
\alpha_{F(k+1)+i} \, \alpha_{F(p-k)-r+i} \quad\text{and}\quad \alpha_{F(k+1)+i'} \, \alpha_{F(p-k)-r+i'}
$$

in parallel whenever the two sets

$$
\{F(k+1) + i,\ F(p-k) - r + i\} \quad\text{and}\quad \{F(k+1) + i',\ F(p-k) - r + i'\}
$$

have an empty intersection, meaning the two Toffoli gates operate on disjoint wires. We define an undirected graph $\mathcal{G}$ with vertex set $\mathbb{Z}/(m)$ (so vertex $i + (m)$ corresponds to the wire representing $\alpha_{i \bmod m}$) and edge set

$$
E := \bigl\{ \{F(k+1) + i \bmod m,\ F(p-k) - r + i \bmod m\} : i = 0, \ldots, m-1 \bigr\},
$$

i.e., each edge corresponds to one Toffoli gate. If we can find an edge coloring of this graph such that neighboring edges always have different colors, then all Toffoli gates corresponding to the same color can be calculated in parallel. We show that 3 colors will be sufficient, and hence a depth 3 circuit suffices to compute all the products in (8). For $\delta = F(p-k) - r - F(k+1)$, let $\langle\delta\rangle$ be the cyclic subgroup generated by $\delta + (m)$ in $\mathbb{Z}/(m)$, and let

$$
\mathbb{Z}/(m) = G_1 \uplus \cdots \uplus G_\ell \qquad (9)
$$

be the decomposition of $\mathbb{Z}/(m)$ into $\langle\delta\rangle$-cosets. Rewriting the edge set E as

$$
E = \bigl\{ \{i \bmod m,\ i + \delta \bmod m\} : i \in \{0, \ldots, m-1\} \bigr\},
$$

we see that the decomposition (9) actually yields a decomposition of the graph $\mathcal{G}$: there are no edges between vertices in $G_j$ and $G_{j'}$ if $j \ne j'$. Moreover, $\langle\delta\rangle$ is cyclic with generator $\delta + (m)$, so for each $G_j$, the subgraph of $\mathcal{G}$ with vertex set $G_j$ is a closed cycle on $\mathrm{ord}(\delta + (m))$ vertices. As such, we may alternately color the edges in such a cycle red and blue. Then neighboring edges can only obtain the same color at the very last step when we try to close the cycle; this happens whenever $\mathrm{ord}(\delta + (m))$ is odd. Hence, for the last edge in a cycle, a third color may be needed. As there are no edges between the individual cycles, we have found the desired 3-coloring of E.

The above argument takes care of all even t-values, and for odd t-values the first of the summations in Equation (4) is taken care of as well.² To argue that for fixed k the terms $\alpha_{k-1+i} \, \alpha_{k-1+m/2-r+i}$ ($i = 1, \ldots, m$) and $\alpha_{k-1+m/2+i} \, \alpha_{k-1-r+i}$ ($i = 1, \ldots, m$) can be computed in depth 3, we can use an analogous argument as above, replacing $\delta$ with $(m/2) - r$ and $(m/2) + r$, respectively. □



Example 3.4 Sticking with the Gaussian normal basis representation of $\mathbb{F}_{2^5}$ from Example 3.3, let us consider the special case of a multiplication $\gamma = \alpha \cdot \beta$ where $\beta = \alpha^2$, i.e., $r = 1$. Then we have $\beta_i = \alpha_{i-1}$, and the Gaussian normal basis multiplication formula can be rewritten as

$$\gamma_i = \underline{\alpha_{1+i}\alpha_{4+i}} + \alpha_{3+i}\alpha_{i} + \alpha_{2+i}\alpha_{2+i} + \underline{\alpha_{4+i}\alpha_{1+i}} + \alpha_{4+i}\alpha_{3+i} + \alpha_{2+i}\alpha_{3+i} + \alpha_{3+i}\alpha_{1+i} + \alpha_{1+i}\alpha_{2+i} + \alpha_{i}\alpha_{i},$$

with all indices taken modulo 5. In particular, the terms $\alpha_{2+i}\alpha_{2+i}$ and $\alpha_i\alpha_i$ can be added with CNOT instead of Toffoli gates, as they fulfill the condition $F(k+1) = F(11-k) - 1$ of the first case in the above proof. We also note that the underlined terms $\alpha_{1+i}\alpha_{4+i}$ and $\alpha_{4+i}\alpha_{1+i}$ cancel each other, which yields a simplification of our circuit that is not reflected by the upper bounds in Proposition 3.4.



Going through the remaining values of $k$ (for which $F(k+1) \neq F(11-k) - 1$ and no cancellation occurs), we obtain the following values $\delta = F(11-k) - 1 - F(k+1)$:

$k$          2     5     6     7     8
$\delta$    -3    -1     1    -2     1
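The classification in this example can be reproduced mechanically. The following Python sketch rests on stated assumptions: the helper names `gnb_f_table` and `classify_terms` are hypothetical; $F$ is built from the usual description of a type-$t$ Gaussian normal basis, where every $k \in \{1, \ldots, p-1\}$ is written as $k = 2^s u^j \bmod p$ with $u$ of multiplicative order $t$ and $F(k) = s$; and term $k$ of $\gamma_i$ is taken to be $\alpha_{F(k+1)+i}\,\alpha_{F(p-k)-r+i}$, as in the proof above.

```python
from collections import Counter

def gnb_f_table(m, t):
    """F(k) for k = 1..p-1 with p = t*m + 1: write k = 2^s * u^j mod p,
    u of multiplicative order t, and set F(k) = s (0 <= s < m)."""
    p = t * m + 1
    u = next(x for x in range(1, p)
             if pow(x, t, p) == 1 and all(pow(x, d, p) != 1 for d in range(1, t)))
    F = [None] * p
    for s in range(m):
        for j in range(t):
            F[pow(2, s, p) * pow(u, j, p) % p] = s
    if any(f is None for f in F[1:]):
        raise ValueError("no type-%d Gaussian normal basis for m = %d" % (t, m))
    return F

def classify_terms(m, t, r):
    """Sort the terms alpha_{a+i} * alpha_{b+i}, a = F(k+1), b = F(p-k) - r,
    of gamma_i (for beta = alpha^(2^r)) into CNOT, cancelled and Toffoli terms."""
    p = t * m + 1
    F = gnb_f_table(m, t)
    pairs = {k: (F[k + 1], (F[p - k] - r) % m) for k in range(1, p - 1)}
    counts = Counter(frozenset(ab) for ab in pairs.values())
    cnot, cancelled, toffoli, kept = [], [], {}, set()
    for k, (a, b) in pairs.items():
        key = frozenset((a, b))
        if counts[key] % 2 == 0 or key in kept:
            cancelled.append(k)              # equal products cancel in pairs over GF(2)
            continue
        kept.add(key)
        if a == b:
            cnot.append(k)                   # alpha_{a+i} * alpha_{a+i} = alpha_{a+i}
        else:
            toffoli[k] = F[p - k] - r - F[k + 1]   # delta, not reduced mod m
    return cnot, cancelled, toffoli

cnot, cancelled, toffoli = classify_terms(5, 2, 1)
print("CNOT:", cnot)             # CNOT: [3, 9]
print("cancelled:", cancelled)   # cancelled: [1, 4]
print("Toffoli:", toffoli)       # Toffoli: {2: -3, 5: -1, 6: 1, 7: -2, 8: 1}
```

The Toffoli entries reproduce the $\delta$ row of the table (for $k = 2, 5, 6, 7, 8$), while $k = 3, 9$ are the CNOT terms and $k = 1, 4$ are the underlined pair that cancels.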



As $m = 5$ is prime, each $\delta + (5)$ generates the complete additive group $\mathbb{Z}/(5)$, and so the graph is simply a closed cycle. For instance, consider $k = 5$, for which $\delta = -1$. Then we obtain the graph in Figure 5, where a vertex labeled $i$ ($i = 0, \ldots, 4$) represents the residue class $i + (5)$, and different line styles indicate different colors.

As shown in Figure 6, this 3-coloring translates into a quantum circuit of depth 3 that computes the terms $\alpha_{4+i}\alpha_{3+i}$ ($i = 0, \ldots, 4$) and adds them to the respective input/partial result $\gamma_i$.
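To see how a coloring becomes a circuit, the following Python sketch simulates one depth-3 layering for $k = 5$ on classical basis states. The layering is hypothetical: it is derived from a 3-edge-coloring of the pentagon and is not necessarily the one drawn in Figure 6, and the helper names are ours. Each layer is one time step in which the Toffoli gates $\gamma_i \mathrel{\oplus}= \alpha_{4+i}\alpha_{3+i}$ act on pairwise disjoint control wires.

```python
M = 5  # GF(2^5) as in Example 3.4

# Hypothetical layering for k = 5 (delta = -1), one layer per colour; the
# entries are term indices i, where term i is gamma_i ^= alpha_{4+i} * alpha_{3+i}.
LAYERS = [[1, 4], [0, 3], [2]]

def controls(i):
    """Control wires (indices into alpha) of the Toffoli gate for term i."""
    return ((4 + i) % M, (3 + i) % M)

def run_layers(alpha, gamma):
    """Apply the Toffoli gates layer by layer to a classical basis state."""
    gamma = list(gamma)
    for layer in LAYERS:
        wires = [w for i in layer for w in controls(i)]
        # Gates in one time step must not share a control wire.
        assert len(wires) == len(set(wires)), "layer is not parallelizable"
        for i in layer:
            a, b = controls(i)
            gamma[i] ^= alpha[a] & alpha[b]
    return gamma

def bits(x):
    """Little-endian bit list of length M."""
    return [(x >> j) & 1 for j in range(M)]

# Every term appears in exactly one layer, so the circuit has depth 3.
assert sorted(i for layer in LAYERS for i in layer) == list(range(M))
print(run_layers(bits(0b11111), bits(0)))   # [1, 1, 1, 1, 1]
```

Since the targets $\gamma_i$ are all distinct, only the control wires need to be checked for conflicts within a layer.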



2 For $\operatorname{ord}(\delta + (m)) = 2$, the sets $G_j$ consist of two vertices, and we actually face graphs with a 2-coloring of the edges.

Figure 5: Graph corresponding to $\delta + (5)$ with $\delta = -1$, as described in Example 3.4. The 3-coloring of the edges, where the different line styles in the pentagon indicate the three different colors, translates into a depth-3 circuit.




Figure 6: Part of a Gaussian normal basis multiplier for $\alpha \cdot \alpha^2 \in \mathbb{F}_{2^5}$: computing the terms $\alpha_{4+i}\alpha_{3+i}$
4 Inversion in depth O(m log(m)) using the Itoh-Tsujii algorithm

With the linear-depth multipliers from the previous section, we can now implement a depth-$O(m \log(m))$ algorithm to invert field elements $\alpha \in \mathbb{F}_{2^m}^*$, provided a Gaussian normal basis or ghost-bit basis representation is available.

The first part of the input is an $m$-bit (Gaussian normal basis) or $(m+1)$-bit (ghost-bit basis) representation $|\alpha\rangle$ of the element $\alpha \in \mathbb{F}_{2^m}$ to be inverted.$^{3}$ Now, providing $\lfloor \log_2(m-1) \rfloor$ auxiliary registers that are initialized with $|0\rangle$, a sequence of $\lfloor \log_2(m-1) \rfloor$ consecutive multipliers can be used to calculate the values $\beta_{2^0}, \beta_{2^1}, \ldots, \beta_{2^{k_1}}$ from Section 2.4; recall that $\beta_{2^0} = \alpha$. From Proposition 3.2 and Proposition 3.4 we obtain the following resource counts for this part of the inverter computation:



• If a ghost-bit basis representation of $\mathbb{F}_{2^m}$ is available, we can find all of $\beta_{2^0}, \beta_{2^1}, \ldots, \beta_{2^{k_1}}$ in depth $\lfloor \log_2(m-1) \rfloor \cdot (2m+2)$ using $\lfloor \log_2(m-1) \rfloor \cdot (m^2+m)$ Toffoli and $\lfloor \log_2(m-1) \rfloor \cdot (m+1)$ CNOT gates. In doing so, $(1 + \lfloor \log_2(m-1) \rfloor) \cdot (m+1)$ qubits suffice.

• If a Gaussian normal basis representation of $\mathbb{F}_{2^m}$ is available, we can find all of $\beta_{2^0}, \beta_{2^1}, \ldots, \beta_{2^{k_1}}$ in depth $\lfloor \log_2(m-1) \rfloor \cdot (3 \cdot (t + (t \bmod 2)) \cdot m - 3)$ using $\lfloor \log_2(m-1) \rfloor \cdot ((t + (t \bmod 2)) \cdot m^2 - m)$ gates (CNOT or Toffoli). In doing so, $(1 + \lfloor \log_2(m-1) \rfloor) \cdot m$ qubits suffice.



3 The input $|0\rangle$ for $|\alpha\rangle$ results in the output $|0\rangle$ as 'inverse.'
At this point, our inverter has computed all of $\beta_{2^0}, \beta_{2^1}, \ldots, \beta_{2^{k_1}}$ and stored each of these values in a separate set of wires. Next, we can use a sequence of $\mathrm{HW}(m-1) - 1$ (general) multipliers, each obtaining an auxiliary input $|0\rangle$, to gather the actually needed values $\beta_{2^{k_1}}, \ldots, \beta_{2^{k_{\mathrm{HW}(m-1)}}}$ and form their product using Equation (6). All exponentiations of the form $\beta_j^{2^i}$ come for free: a multiplier can simply read out the coefficients of the respective $\beta_j$ in permuted order to obtain the required input value, which amounts to a permutation of the control bit positions. Consequently, we have the following resource counts:

• If a ghost-bit basis of $\mathbb{F}_{2^m}$ is available and given $|\beta_{2^{k_1}}\rangle, \ldots, |\beta_{2^{k_{\mathrm{HW}(m-1)}}}\rangle$, we can compute $|\beta_{m-1}\rangle$ in depth $(\mathrm{HW}(m-1) - 1) \cdot (m+1)$ using $(\mathrm{HW}(m-1) - 1) \cdot (m^2 + 2m + 1)$ Toffoli gates. For the auxiliary inputs $|0\rangle$, which subsequently store intermediate results, $(\mathrm{HW}(m-1) - 1) \cdot (m+1)$ qubits suffice.

• If a Gaussian normal basis of $\mathbb{F}_{2^m}$ is available and given $|\beta_{2^{k_1}}\rangle, \ldots, |\beta_{2^{k_{\mathrm{HW}(m-1)}}}\rangle$, we can compute $|\beta_{m-1}\rangle$ in depth $(\mathrm{HW}(m-1) - 1) \cdot ((t + (t \bmod 2)) \cdot m - 1)$ using $(\mathrm{HW}(m-1) - 1) \cdot ((t + (t \bmod 2)) \cdot m^2 - m)$ Toffoli gates. For the auxiliary inputs $|0\rangle$, which subsequently store intermediate results, $(\mathrm{HW}(m-1) - 1) \cdot m$ qubits are needed.

The final squaring operation in the Itoh-Tsujii algorithm is again free, as the last multiplier can simply write out the result in permuted order. In summary, we obtain the following estimate for a ghost-bit basis, where we double depth and gate count to account for the resources needed to 'uncompute' auxiliary values; this is an upper bound, as the last multiplication actually does not have to be 'undone.'
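As a classical cross-check of the two phases described above, the following Python sketch inverts field elements given as coefficient lists in a Gaussian normal basis. It is a sketch under assumptions, not the quantum circuit itself: the names `gnb_f_table`, `gnb_mul`, `frob`, and `itoh_tsujii_inverse` are hypothetical; multiplication uses the Gaussian normal basis formula in its commonly stated form (terms $a_{F(k+1)+i}\,b_{F(p-k)+i}$, plus the additional $m/2$-offset terms for odd $t$); and we use $\beta_n = \alpha^{2^n - 1}$ with $\beta_{n+w} = \beta_n^{2^w} \cdot \beta_w$. Powers $x^{2^r}$ are computed as cyclic shifts, mirroring the 'free' permutations of wires.

```python
def gnb_f_table(m, t):
    """F(k) for k = 1..p-1, p = t*m + 1: k = 2^s * u^j mod p with u of order t, F(k) = s."""
    p = t * m + 1
    u = next(x for x in range(1, p)
             if pow(x, t, p) == 1 and all(pow(x, d, p) != 1 for d in range(1, t)))
    F = [None] * p
    for s in range(m):
        for j in range(t):
            F[pow(2, s, p) * pow(u, j, p) % p] = s
    if any(f is None for f in F[1:]):
        raise ValueError("no type-%d Gaussian normal basis for m = %d" % (t, m))
    return F

def gnb_mul(a, b, F, t):
    """Product of coefficient lists a, b in the type-t Gaussian normal basis."""
    m = len(a)
    p = t * m + 1
    c = []
    for i in range(m):
        s = 0
        for k in range(1, p - 1):                       # k = 1, ..., p - 2
            s ^= a[(F[k + 1] + i) % m] & b[(F[p - k] + i) % m]
        if t % 2 == 1:                                  # odd type (m is then even)
            h = m // 2
            for k in range(1, h + 1):
                s ^= a[(k - 1 + i) % m] & b[(h + k - 1 + i) % m]
                s ^= a[(h + k - 1 + i) % m] & b[(k - 1 + i) % m]
        c.append(s)
    return c

def frob(a, r):
    """a^(2^r) as a cyclic shift: coefficient i of the result is a_{i-r}."""
    m = len(a)
    return [a[(i - r) % m] for i in range(m)]

def itoh_tsujii_inverse(a, F, t):
    """alpha^(-1) = beta_{m-1}^2 with beta_n = alpha^(2^n - 1); maps 0 to 0."""
    m = len(a)
    # Phase 1: beta_{2j} = beta_j^(2^j) * beta_j, floor(log2(m-1)) multiplications.
    beta = {1: a}
    j = 1
    while 2 * j <= m - 1:
        beta[2 * j] = gnb_mul(frob(beta[j], j), beta[j], F, t)
        j *= 2
    # Phase 2: fold in the other 1-bits of m - 1, HW(m-1) - 1 multiplications.
    acc, n = None, 0
    for k in reversed(range((m - 1).bit_length())):
        w = 1 << k
        if (m - 1) & w:
            if acc is None:
                acc, n = beta[w], w
            else:
                acc = gnb_mul(frob(acc, w), beta[w], F, t)   # beta_{n+w} = beta_n^(2^w) * beta_w
                n += w
    assert n == m - 1
    return frob(acc, 1)                                   # final squaring: one more shift

F = gnb_f_table(7, 4)            # type-4 Gaussian normal basis of GF(2^7), p = 29
a = [1, 0, 1, 1, 0, 0, 0]
print(gnb_mul(a, itoh_tsujii_inverse(a, F, 4), F, 4))   # all-ones vector, i.e., 1
```

In this basis the all-ones vector represents 1. The phase-1 loop performs $\lfloor \log_2(m-1) \rfloor$ products of the special form $x^{2^j} \cdot x$, and phase 2 performs $\mathrm{HW}(m-1) - 1$ general products, matching the multiplier counts used in the estimates; an all-zero input is mapped to zero, in line with footnote 3.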

Proposition 4.1 If a ghost-bit basis for $\mathbb{F}_{2^m}$ is available, the inversion $|\alpha\rangle|0\rangle \mapsto |\alpha^{-1}\rangle|0\rangle$ can be implemented in depth $2 \cdot \lfloor \log_2(m-1) \rfloor \cdot (2m+2) + 2 \cdot (\mathrm{HW}(m-1) - 1) \cdot (m+1) = O(m \log_2(m))$ using $2 \cdot \lfloor \log_2(m-1) \rfloor \cdot (m^2+m) + 2 \cdot (\mathrm{HW}(m-1) - 1) \cdot (m^2+2m+1)$ Toffoli and $2 \cdot \lfloor \log_2(m-1) \rfloor \cdot (m+1)$ CNOT gates. The inversion can be implemented with $(1 + \lfloor \log_2(m-1) \rfloor) \cdot (m+1) + (\mathrm{HW}(m-1) - 1) \cdot (m+1) = O(m \log_2(m))$ qubits.

Analogously, adding the respective bounds for the case of a Gaussian normal basis of type $t$ yields the following estimate. If we consider $t$ as constant, the depth of the resulting circuit is again in $O(m \log_2(m))$.

Proposition 4.2 If a Gaussian normal basis of type $t$ for $\mathbb{F}_{2^m}$ is available, the inversion $|\alpha\rangle|0\rangle \mapsto |\alpha^{-1}\rangle|0\rangle$ can be implemented in depth $\lfloor \log_2(m-1) \rfloor \cdot (6 \cdot (t + (t \bmod 2)) \cdot m - 6) + 2 \cdot (\mathrm{HW}(m-1) - 1) \cdot ((t + (t \bmod 2)) \cdot m - 1) = O(m \log_2(m))$ using $2 \cdot \lfloor \log_2(m-1) \rfloor \cdot ((t + (t \bmod 2)) \cdot m^2 - m) + 2 \cdot (\mathrm{HW}(m-1) - 1) \cdot ((t + (t \bmod 2)) \cdot m^2 - m)$ gates (CNOT or Toffoli). The inversion can be implemented with $(1 + \lfloor \log_2(m-1) \rfloor) \cdot m + (\mathrm{HW}(m-1) - 1) \cdot m = O(m \log_2(m))$ qubits.
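The bounds in Propositions 4.1 and 4.2 are easy to evaluate for concrete parameters. The following Python sketch merely transcribes those formulas (the function names are hypothetical); it does not derive them.

```python
def hw(n):
    """Hamming weight of n."""
    return bin(n).count("1")

def floor_log2(n):
    """floor(log2(n)) for n >= 1."""
    return n.bit_length() - 1

def ghost_bit_inverter(m):
    """(depth, Toffoli gates, CNOT gates, qubits) from Proposition 4.1."""
    L, H = floor_log2(m - 1), hw(m - 1)
    depth = 2 * L * (2 * m + 2) + 2 * (H - 1) * (m + 1)
    toffoli = 2 * L * (m * m + m) + 2 * (H - 1) * (m * m + 2 * m + 1)
    cnot = 2 * L * (m + 1)
    qubits = (1 + L) * (m + 1) + (H - 1) * (m + 1)
    return depth, toffoli, cnot, qubits

def gnb_inverter(m, t):
    """(depth, CNOT-or-Toffoli gates, qubits) from Proposition 4.2 (type-t basis)."""
    L, H = floor_log2(m - 1), hw(m - 1)
    tt = t + (t % 2)
    depth = L * (6 * tt * m - 6) + 2 * (H - 1) * (tt * m - 1)
    gates = 2 * L * (tt * m * m - m) + 2 * (H - 1) * (tt * m * m - m)
    qubits = (1 + L) * m + (H - 1) * m
    return depth, gates, qubits

print(ghost_bit_inverter(7))   # (80, 352, 32, 32)
print(gnb_inverter(7, 4))      # (378, 1134, 28)
```

When $m - 1$ is a power of two, $\mathrm{HW}(m-1) = 1$ and the second summand of each bound vanishes, so only the special multipliers contribute.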

It is worth noting that if our extension degree $m$ has the form $m = 2^n + 1$, e.g., for $m$ being a Fermat prime, the Hamming weight of $m - 1$ is one, i.e., we can restrict entirely to the special multipliers described in Proposition 3.2 and Proposition 3.4. As in the general case, the last multiplier can write out its result in permuted order, so that the correct inverse $(\beta_{m-1})^2$ is obtained without the need to implement a squaring operation.

Moving away from this special case, the following example illustrates the structure of the discussed inverter for an extension of degree 7, where a general multiplier with two arguments is needed.

Example 4.1 Consider the field $\mathbb{F}_{2^7}$ we discussed in Example 2.4, and assume a Gaussian normal basis representation is used. Then, to compute $\alpha^{-1}$ from an input $\alpha = \beta_{2^0} \in \mathbb{F}_{2^7}$, we can use two special multipliers as described in Proposition 3.4 to compute $\beta_{2^1}$ and $\beta_{2^2}$. Interpreting the input wires in appropriately permuted order, one general multiplier suffices to compute $\beta_{2^2+2^1}$. In addition, by writing the output in appropriately permuted order, the output of this multiplier is actually $\beta_{2^2+2^1}^2 = \alpha^{-1}$.

Representing an $m$-qubit input by a single wire, the structure of the resulting inverter in $\mathbb{F}_{2^7}$ is summarized in Figure 7.
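The multiplier sequence of Example 4.1 generalizes to any $m$. The following Python sketch (hypothetical function name) lists, for a given $m$, which products the inverter forms: special multipliers $\beta_{2j} = \beta_j^{2^j} \cdot \beta_j$ up to $\beta_{2^{k_1}}$, followed by general multipliers that fold in the remaining 1-bits of $m - 1$ from the most significant bit downward. That particular folding order is our choice for illustration.

```python
def itoh_tsujii_schedule(m):
    """Multiplications of the Itoh-Tsujii inverter for GF(2^m), m >= 2.

    A step (kind, n, left, shift, right) means beta_n = beta_left^(2^shift) * beta_right,
    where beta_n = alpha^(2^n - 1); the final inverse is beta_{m-1}^2 (a free permutation)."""
    steps = []
    j = 1
    while 2 * j <= m - 1:                      # special multipliers: x^(2^j) * x
        steps.append(("special", 2 * j, j, j, j))
        j *= 2
    ones = [1 << k for k in reversed(range((m - 1).bit_length())) if (m - 1) >> k & 1]
    n = ones[0]                                # largest power of two in m - 1
    for w in ones[1:]:                         # general multipliers: remaining 1-bits
        steps.append(("general", n + w, n, w, w))
        n += w
    return steps

for step in itoh_tsujii_schedule(7):
    print(step)
# ('special', 2, 1, 1, 1)
# ('special', 4, 2, 2, 2)
# ('general', 6, 4, 2, 2)
```

For $m = 7$ the single general step is $\beta_6 = \beta_4^{2^2} \cdot \beta_2$, i.e., the general multiplier of Example 4.1, and for $m = 2^n + 1$ the list contains special steps only.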



Finally, we obtain as a direct consequence of Propositions 4.1 and 4.2 the following corollary, which gives an upper bound on the number of T-gates needed to perform inversion in a binary finite field where a ghost-bit basis or a Gaussian normal basis representation is available. It follows from a realization [23, Chapter 4.2] of a Toffoli gate using 7 T-gates (or $T^\dagger$-gates, which we assume to have the same cost) in a circuit of overall T-depth 6.
Figure 7: A ghost-bit or Gaussian normal basis inverter for $\alpha \in \mathbb{F}_{2^7}$
Corollary 4.1 If a ghost-bit basis for $\mathbb{F}_{2^m}$ is available, an inverter can be implemented with a T-depth of at most $12 \cdot \lfloor \log_2(m-1) \rfloor \cdot (2m+2) + 12 \cdot (\mathrm{HW}(m-1) - 1) \cdot (m+1)$ and using no more than $14 \cdot \lfloor \log_2(m-1) \rfloor \cdot (m^2+m) + 14 \cdot (\mathrm{HW}(m-1) - 1) \cdot (m^2+2m+1)$ T-gates.

If a Gaussian normal basis of type $t$ for $\mathbb{F}_{2^m}$ is available, an inverter can be implemented with a T-depth of at most $6 \cdot \lfloor \log_2(m-1) \rfloor \cdot (6 \cdot (t + (t \bmod 2)) \cdot m - 6) + (12 \cdot \mathrm{HW}(m-1) - 6) \cdot ((t + (t \bmod 2)) \cdot m - 1)$ using at most $14 \cdot \lfloor \log_2(m-1) \rfloor \cdot ((t + (t \bmod 2)) \cdot m^2 - m) + 14 \cdot (\mathrm{HW}(m-1) - 1) \cdot ((t + (t \bmod 2)) \cdot m^2 - m)$ T-gates.

5 Comparison and conclusions 

The above discussion demonstrates that the use of finite field representations other than a polynomial basis can enable efficient and elegant quantum circuits for binary finite field arithmetic. Table 1 gives a brief asymptotic comparison of the circuit depth for the representations discussed here and for a polynomial basis representation. For a Gaussian normal basis representation, the exact depth increases with the type $t$, but for cryptographic purposes a value of $t = 10$ is already unusually high, and small values like $t = 2$ or $t = 4$ are more typical; here, we consider $t$ as a (small) constant.



Table 1: Circuit depth of $\mathbb{F}_{2^m}$-operations for different representations

                                 Addition    Multiplication    Inversion
polynomial basis [3, 15, 18]     $O(1)$      $O(m)$            ext. Euclidean alg.: $O(m^2)$
ghost-bit basis                  $O(1)$      $O(m)$            Itoh-Tsujii alg.: $O(m \log(m))$
Gaussian normal basis            $O(1)$      $O(m)$            Itoh-Tsujii alg.: $O(m \log(m))$



Table 2 gives an asymptotic comparison of the number of gates involved. Again, for Gaussian normal bases we consider the type $t$ as a (small) constant.



Table 2: Number of gates when implementing $\mathbb{F}_{2^m}$-operations for different representations

                                 Addition    Multiplication    Inversion
polynomial basis [3, 15, 18]     $O(m)$      $O(m^2)$          $O(m^3)$
ghost-bit basis                  $O(m)$      $O(m^2)$          $O(m^2 \log(m))$
Gaussian normal basis            $O(m)$      $O(m^2)$          $O(m^2 \log(m))$



Overall, a main feature of the representations considered here is the convenient implementation of inversion in $\mathbb{F}_{2^m}$: with a 'free' squaring operation available, the discussed technique by Itoh and Tsujii offers a viable alternative to Euclid's algorithm. It appears worthwhile to further explore the potential of different finite field representations for deriving quantum circuits that can, e.g., be used in connection with Shor's algorithm. From a cryptographic point of view, binary fields certainly play a prominent role, but, e.g., the discussion of Optimal Extension Fields by Bailey and Paar [1] illustrates that finite fields of larger characteristic are of cryptographic interest as well, as they can facilitate efficient (classical) implementations. Exploring different representations of finite fields with odd characteristic appears to be a worthwhile endeavor for future work.